Tag

#speculative decoding

5 articles

Tencent Open-Sources AngelSpec: A Unified Training Framework for MTP and Block-Parallel Speculative Decoding on Hy3 Models

Tencent open-sources AngelSpec, a torch-native framework for speculative decoding that introduces DFly, a block-diffusion drafter, and achieves significant speedups in LLM inference.

Jul 304

DeepSeek Releases DSpark, a Speculative Decoding Framework That Accelerates DeepSeek-V4 Per-User Generation 60–85% Over MTP-1

Learn how DSpark, a new AI framework from DeepSeek, speeds up text generation in AI models by making smart guesses and verifying only when necessary, without sacrificing accuracy.

Jun 2745

DFlash Speculative Decoding Drafts Whole Token Blocks in Parallel for Up to 15x Higher Throughput on NVIDIA Blackwell

Researchers at UC San Diego introduce DFlash, a new speculative decoding technique that drafts whole token blocks in parallel, achieving up to 15x throughput improvement on NVIDIA Blackwell.

Jun 2366

Meet EAGLE 3.1: The Speculative Decoding Algorithm That Fixes Attention Drift in LLM Inference

EAGLE 3.1, developed by the EAGLE team, vLLM, and TorchSpec, tackles attention drift in LLM inference, enhancing speculative decoding stability for production use.

May 2661

A New NVIDIA Research Shows Speculative Decoding in NeMo RL Achieves 1.8× Rollout Generation Speedup at 8B and Projects 2.5× End-to-End Speedup at 235B

Learn how speculative decoding helps AI systems generate text faster without losing accuracy, using a fast guess-and-check method.

May 171